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Abstract 


We consider a variant of the well-known Gauss-Seidel method for the so- 
lution of Markov chains in steady state. Whereas the standard algorithm 
visits each state exactly once per iteration in a pre-determined order, the 
alternative approach uses a dynamic strategy. A set of states to be visited 
is maintained which can grow and shrink as the computation progresses. In 
this manner, we hope to concentrate the computational work in those areas 
of the chain in which maximum improvement in the solution can be achieved. 
We consider the adaptive approach both as a solver in its own right and as 
a relaxation method within the multi-level algorithm. Experimental results 
show significant computational savings in both cases. 
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1. Introduction 


We are interested in developing efficient methods for computing steady-state solutions of large 
continuous-time Markov chains. In particular we are interested in new, improved algorithms for 
general chains with which solutions can he obtained with substantially less effort than using the 
standard schemes. We consider in this paper the adaptive Gauss-Seidel (AGS) method as a variant 
of the well-known Gauss-Seidel algorithm (GS) in the form in which it is usually implemented. Our 
adaptive relaxation method is based on that of Rude [7]. Adaptive relaxation dispenses with the 
statically ordered processing of states in favour of a dynamic strategy. By making an appropriate 
choice of nodes to visit, it is hoped that computations that will have little effect on the solution 
can be spared and the attention be concentrated on those areas of the solution vector where the 
solution can most efficiently be improved. 

We discuss adaptive Gauss-Seidel in two different roles. First we consider it as a solution method 
in its own right, i.e. as a direct alternative to the standard GS scheme. Second we consider its use 
as a relaxation method within the recently introduced multi-level algorithm [2]. It is shown that 
AGS acquires a particular meaning in this context. 

In section 2 we give the problem statement and introduce some notation. In section 3 we 
describe and discuss the adaptive Gauss-Seidel algorithm. In section 4 we briefly state the multi- 
level method and show the particular meaning of AGS in this context. In section 5 results of 
numerical experiments are presented showing the performance of the Gauss-Seidel and multi-level 
algorithms both with and without the adaptive modification. It will be shown that the adaptive 
approach can lead to improved performance in both cases. In the final section we summarize the 
paper. 

2. Problem Description and Aggregation Equations 

Consider a Markov chain consisting of n states sq...s u ^\. Denote the unknown vector by p, 
where p t is the probability of being in state s,-. 

We then have to solve the system of equations 


Pp = 0 


with the additional condition 

i ~ ?i — l 

J2 = 1 
»=0 

Note that this equation is usually written as nQ = 0 for tt = p T and Q = P T , where Q is the 
infinitesimal generator matrix. 

A coarser representation of the Markov chain described by matrix P may be obtained by ag- 
gregation. This means creating a new Markov chain described by a matrix Q with the vector of 
state probabilities q, each of whose N states 5o . . . 5V - 1 is derived from a number of states of the 
original system. Figure 1 illustrates the situation for an eight-state Markov chain P, where states 
are aggregated in pairs to form a four-state coarser level system Q. 

In the following we will use the terms fine level and coarse level to refer to Markov chains where 
the latter is obtained by aggregation from the former. The relation s* € 5, signifies that the fine 
level state s * is mapped by the aggregation operation to the coarse level state .9,. 
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Figure 1: Aggregation of Markov Chains 


The matrix Q of the aggregated system is chosen as follows : 


Qji ~ 


E Pk E Pi k 

s k eS t s{€Sj 


E Pk 

Sk£Si 


( 1 ) 


This is the classical aggregation matrix. Note that the matrix Q is a function not only of the fine 
level matrix P, but also of the fine level solution vector p. 

This yields the aggregated equation in the unknown q : 


Qq = 0 


N - 1 

5Z « = 1 ■ 


2—0 


It can then be shown that 

?»■ = p* > 

s k es, 

i.e. the solution q of the aggregated system truly represents a coarser version of the solution p of the 
original problem. The probability of being in state qi is the sum of the probabilities of being in any 
of its constituent fine-level states. We use the aggregation equation as a basis for the multi-level 
method, whereby we approximate the exact solution values Pk in (1) above by values from the 
current iterate. 


3. Adaptive Relaxation 

Gauss-Seidel is an iterative method for the solution of linear systems of equations which proceeds 
by successively solving the local equation for each unknown by modifying the value of the unknown 
to make its residual equal to zero. The Gauss-Seidel method is given in figure 2. Th e standard 
Gauss-Seidel method visits each state exactly once per iteration step. In addition the order in 
which states are visited is fixed and in practice is determined largely coincidentally by the order in 
which states have been generated. In the case of Markov chains derived from generalized stochastic 
Petri-nets (GSPNs) the ordering is a depth-first or breadth-first traversal starting at the state 
representing the initial marking. 

We may consider the effectiveness of the Gauss-Seidel method at any one state Si at any time 
during the computation to be the amount by which the current solution value p, changes when 
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the GS relaxation step is applied to that state only. Thus GS is effective when it is able to 
bring about a large improvement in the state’s value, and is ineffective otherwise. However, if the 
method is converging towards the steady-state solution, then changes in the solution must become 
successively smaller. Thus the effectiveness is a relative measure, meaningful only with respect to 
the effectiveness at other states of the chain at a given instant during the computation. 

Intuition tells us that the effectiveness of GS at any point in time during the solution process 
can vary greatly between different states. In the Markov chain below, for example, there are many 
states for which the effectiveness is initially zero, i.e. application of GS at these states would not 
change the solution at all! Ideally, of course, we would like to apply GS to those states where the 
effectiveness is highest. In this case, the computational effort would be expended with maximum 
efficiency. Conversely we would like be able to pass over states with low effectiveness and not 
spend any computational effort on them. As the computation progresses, the effectiveness of 
states changes as the values of their neighbours are modified. Unfortunately, of course, we do not 
know which states have the highest effectiveness, i.e. the largest residuals, and finding them would 
essentially involve performing GS at every state, thus destroying the very advantage we were hoping 
to achieve. 

We must therefore adopt a different strategy, which is derived from that of Rude [7], who applied 
adaptive smoothing to the solution of partial differential equations. We introduce a set of states, 
called the active set , which is an approximation to the set of states with high effectiveness at the 
current stage of the computation. By only considering states from this set for the application of 
GS, we hope to concentrate our computational effort in the “hot spots” - those areas in which GS 
is able to achieve substantial improvements to the solution. 

The adaptive Gauss-Seidel algorithm is given in figure 3, where M denotes the active set of 
states. Since we do not know initially which are the states with high effectiveness, we are forced to 
initialize the active set M to include all states in the Markov chain. The main loop of the algorithm 
repeats until M has become empty. A state s t to be relaxed is chosen and removed from M and 
its current solution value stored in a temporary variable t . The relaxation is then performed. If 
the change in the solution value exceeds a pre-defined limit c then all states in the chain that are 
directly influenced by state s t are inserted into M. The motivation for this set update strategy 
is that a large change in state s t changes the residual by a proportionate amount at those states 
whose values depend on p t , and thus it is likely that a high effectiveness is induced there. 

Upon termination of AGS, we assume that solving the local equation at any state cannot improve 
the solution there by more than an amount proportional to c. Although this means that little 
improvement can be achieved locally, this of course tells us nothing about the accuracy of the 
solution. We must therefore repeat the procedure with a reduced value of c. We choose a simple 
strategy given by the following algorithm: . 


procedure solve_withJLGS 

e = c 0 

while ||Pp||oo > 6 do 

adapt ive_Gauss _Seidel(c) 
c = c * Ac 


with Ac chosen to satisfy 0 < Ac < 1. 
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As a simple example we consider solving the birth/death chain of figured with a length of n — -M, 
a birth rate of A = 49 and a death rate of p = 50. Setting p = X/p and counting from i = 0, this 
chain has the solution 

_ 1 ~ t -» 

P ' 1 - p nP 

If, as is typically the case in practice, we initialize the solution vector to the constant function 
^ . , , , — then we can observe that the application of GS to any state other than s‘q and $ n -\ W1 H 

have no effect, i.e. the computational effort would be completely wasted. Only at the ends of the 
chain can an improvement be achieved initially. This is because the initial guess solves the local 
equations at every internal state: 

1 M 1\ 1 

A + // V n n) 71 


We show the history of the active set in figure 5, where the nodes of the chain are plotted vertically 
and time horizontally. A cutoff value of c = le — 4 was used. An “X denotes a state currently 
in the active set and a a state not currently in the active set, whilst an “O” shows the state 
currently being relaxed. During the computation, each node is initially visited once, as all nodes 
start out in the active set. However, it can be seen that only at the ends of the chain are nodes put 
back into the set and the computation proceeds to treat only nodes 0 . . .4 and n — 5 . . .n — 1. It 
is important to realize that this would hold regardless of the size of the chain, thus the proportion 
of nodes that would be visited after the initial sweep can be extremely small. Hence this method 
achieves the required result: it only expends computational effort in those portions of the chain in 
which substantial improvements can be achieved. When the active set has emptied, c is reduced 
and the adaptive procedure repeated with the smaller tolerance. Note that we initialize the active 
set to include all states of the chain in order to err on the side of safety; in this particular example 
we could have achieved essentially the same result at significantly reduced cost by initializing the 
active set to just {.s 0 , s n _i}. 

4. Multi-Level Solution Algorithm 

In this section we briefly review the recently introduced multi-level algorithm, details of which 
can be found in [2]. The multi-level algorithm is based on a recursive aggregation of the Markov 
chain to obtain approximations of successively smaller dimensions. The algorithm passes through 
all levels of the hierarchy of chains in a downward-upward sweep. Solutions on finer levels are used 
to construct coarser equations, the approximate solutions of which are used to correct those on the 
next finest level. The coarser level equations are the aggregation equations of section 2. 

We adopt the following abbreviations for vectors a, 6, c G 

d — b ♦ C — G| — bi + Cj, 1 ^ 771 

a = bfc = a, = 6,/cj, 1 < i < m 


The toolevel version of the ML iteration is given by the following sequence of steps. 


• Perform OS relaxation on finer level 

p = GS(pW) 

• Restrict solution to coarse level 

q = R(p) = qi= P k 

s k es, 
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• Compute coarse lovol aggregation matrix 


E h E hk 

**€►*»» <u€$ } 


• Solve coarse equation for q 


A/ - 1 

0'/ = 0 , 7> - I 

1=0 


• Compute coarse level correction 


H = <l/<] 


• Compute fine level correction 


VI) 




V = /('/*) 


Pt- = <h 


• Apply fine level correction 


= p = C(p,p*) = p*p* 

In this two-level form the method is similar to well-known iterative aggregation /disaggregation 
(IAD) methods such as those of Koury, McAllister and Stewart [3] and of Takahashi [ 9 ]. The 
jnulti-hvd algorithm is obtained by recursive application of the two-level algorithm to obtain a 
solution to the aggregated equation ( 3 ) and is described in algorithmic form in figure 6. We use the 
subscript l to denote level of representation (/ = l max finest level, / = 0 coarsest level). The coarse 
level l — 1 and fine level l between which the operators I and R map are identified by appropriate 
indices. Note that, because of the recursive nature of the algorithm, the unknowns <7*, <7 and q are 
represented by the variables Pi- 1 and p/_j , respectively. We allow in general the possibility 
of applying GS v times at each level with v > 1 , denoted by GS" . 

The aggregation strategy used at present is very simple. It attempts to map pairs of fine level 
states that are strongly coupled to a common coarse level state. To achieve this, we loop through 
all states of a given level and for each state .S{ that has not yet been assigned to an aggregated 
state, we choose the unassigned neighbour s 3 for which P 3% is maximal. Any states that remain 
unaggregated by this policy are mapped to an assigned neighbour s 3 for which P JX is maximal. 
Thus aggregated states are almost always composed of fine-level states that are neighbours and 
states with the strongest coupling coefficients are aggregated together. 

Adaptive Gauss-Seidel has a particular relevance when used as the relaxation component of the 
ML algorithm. In order to do this we borrow concepts from the multigrid literature, in particular 
[<S]. We consider the error e t in the current solution value pi in state S{ 

?i = Pi - Pi , 

and differentiate between high and low frequency error components, whereby the highest frequency 
errors are those that oscillate in size between neighbouring states. Low frequency errors are those 
that vary only slowly across the chain. Upon completion of AGS, we can assume that the magnitude 
of high frequency error components is small everywhere, since no substantial improvements in the 
solution can be made locally. Thus we can conclude that the relative magnitudes of all unknowns 


with respect to their immediate neighbours are approximately correct, regardless of the absolute 
size of the error e in those unknowns. 

Consider now the definition of the coarse level matrix in the ML method (2): 

E Pk ( E Phi ) 

x sktSi \sieSj } 

y jt — 

E Pk 


The matrix Q is an approximation to the correct coarse system matrix Q, which is obtained by 
setting p = p above. Since approximate solutions to the coarse system are used to derive corrections 
to the solution at the next highest level, it is clear that the difference between Q and Q may affect 
the behaviour of the ML method. In particular, if Q is a bad approximation, then the coarse level 
correction may be extremely inaccurate. We surmise that it is cases such as this which have led 
to reports by some authors of divergence of iterative aggregation-disaggregation methods such as 
that of Takahashi [9] for some problems. 

The quality of Q, i.e, the size of Q — Q, depends on the quantities 

Pk 

E Pk ’ 

which is the conditional probability of the Markov chain being in state Sk, given that it is in the 
aggregate state 5/. Thus it is not necessary to know the absolute size of p in order to be able to 
compute a correct value for Q, but it is sufficient to know the relative sizes of all fine-level unknowns 
which are aggregated to a common coarse level state. This set of fine-level states is by the definition 
of the aggregation strategy always composed of close neighbours and in most cases is a subset of 
the set of all immediate neighbours of any state. For any fine-level state Sj and coarse level state 
.S’,- for which holds Sj E S t we have as a rule 

{sk : »k € Si} C {si : P 3 i / 0} . 

Thus AGS gives us a means to control the quality of the coarse level matrix by eliminating high- 
frequency errors to a controlled tolerance at a possibly greatly reduced cost compared to GS. 

5. Experimental Results 

Figure 7 shows a multiprocessor system in which the u processors Pv\ . . . Pv n compete foi access 
to two memory units CMl, ('M2 via a common bus. Marsan, Balbo and Conte [4] gi^e a GSPN 
model of this multiprocessor (the structure of which is shown in figure 8) which allows for the 
possibility of failure and repair of the processors, the bus and the memory units. The model 
allows the computation of the effective utilization of the processors in the presence of failures and 
competition for the system resources. 

Figure 9 shows the computational work of the GS, AGS, ML-GS and ML- AGS methods applied 
tu this problem, where ML-GS (ML- ACS) denotes the multi-level method using standard (adaptive) 
( lauss-Seidel as a relaxation component. We show the total number of millions of floating point 
operations used as a function of problem size measured as the number of processors in the model. 
The number of states of the Markov chains varied from 91 (2 processors) to 38X3 (10 processors). 
All problems were solved to an accuracy of ||T/t|| x < !c - 9- 

0 


Comparing GS and AGS, we see that we are able to achieve a substantial improvement via the 
adaptive strategy. For the smallest problem considered, AGS is a factor of about 3.6 faster (not 
. discernible in the figure); for the largest it is about 9.6 times faster. 

In order to compare ML-GS and ML- AGS we magnify the lower section of figure 9, shown as 
figure 10. Here we see that the adaptive technique also improves the multi-level method. Since, 
howevei , the ML-GS method is already very efficient for this problem, needing only between eight 
and ten iterations to achieve convergence, there was little room left for improvement for ML-AGS. 
Both ML schemes are still substantially faster than AGS. 

Comparing the standard GS and ML-GS schemes, we see that although these are problems of 
very small size, the saving in computational effort of ML over GS is quite dramatic: a factor of 39 
for the smallest and of 77 for the largest problems considered. It is also clear that the gap widens 
as the problem size is increased. It is results such as these, see [2] for more examples, that make 
us confident that the multi-level method is a strong candidate as a steady-state solver for Markov 
chains. 

Figure 11 shows the results of the four methods applied to the example stochastic Petri-net in 
the original paper of Molloy [5], the structure of which is shown in figure 12. For this problem, 
the computational work of GS grows extremely fast with problem size, measured by the number 
of tokens k in the initial marking. The number of states of the Markov chain varies from 506 to 
23821. AGS is a distinct improvement, being already factors of 5 and 8 faster for 20 and 30 tokens 
respectively. ML-GS is about twice as fast as AGS throughout and ML-AGS another factor of 2 to 
4 faster still. Thus the overall improvement from the standard scheme to the best new scheme is a 
factor of 40 for 30 tokens and increases as the problem grows larger. 

Figure 13 shows the results of the four methods applied to a model of a processor cluster with 
failures and repairs by Muppala and Trivedi [6] (figure 14). In the model jobs arriving can be pro- 
cessed or rejected, depending on whether the system is down or up. A quorum of active processors 
can be specified, which determines whether jobs can be accepted by the system or not. Enabling 
functions (not shown) are used to define the model’s behaviour. In addition, the size of the buffer 
receiving the jobs can be varied. We chose to scale the size of the problem by varying the buffer 
length between 8 and 64, yielding Markov chains with 81 ... 585 states. For this model, the opera- 
tion count for GS grows sharply, but linearly with buffer size, whereas the other methods only grow 
at a more modest rate. ML-AGS is superior to ML-GS by approximately 30% throughout; both 
are about four times faster than AGS, despite the fact that this is an extremely small problem. 

6. Conclusions 

In this paper we have described and discussed the adaptive Gauss-Seidel method as a variation 
of the well-known Gauss-Seidel solver for Markov chains. We also gave a brief description of the 
multi-level algorithm which was recently introduced in [2] and which has been shown to often re- 
quire significantly less computation time than the standard scheme for a number of test problems. 
Experimental results showed that the introduction of an adaptive strategy can improve the perfor- 
mance of the Gauss-Seidel method by almost an order of magnitude, and that it can also be used 
to advantage as a component of the ML algorithm. 

Possible extensions and modifications are to take the coefficients P tJ into account when deciding 
whether neighbouring states are to be inserted into the active set. This would more accurately 
reflect the change in magnitude of the residual in those states, which would avoid insertion of states 
with low effectiveness and thus lead to further computational savings. One might also consider 
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an adaptive SOR scheme, in which the basic AGS method is modified to allow overrelaxation. 
A problem which we have yet to resolve satisfactorily is the automated choice of values for the 
parameters <? 0 and Ac in the AGS scheme. Alternatively, one might consider a dynamic tuning of 
Co during the computation. 

Further work will include the implementation of a “fully adaptive” multi-level solver in a manner 
similar to Rude [7] - one in which the active set processing crosses the levels of the hierarchy. In 
this way it is hoped that an adaptive relaxation which remains restricted to local areas of the chain 
on one level initializes the active set on neighbouring levels to a corresponding subset of the states 
on those levels. This may lead to substantial savings for the ML-AGS method, as restarting each 
level with a full active set could be avoided. 
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Figure 2: Standard Gauss- Seidel Algorithm 



Figure 3: Adaptive Gauss-Seidel algorithm 
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Figure 4: Birth- Death Markov Chain 
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Figure 5: Active set history 
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Figure 6: Multi-Level Algorithm 



Figure 7: Simple Multiprocessor Example 
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Figure 9: Computational work for GS (A), AGS (B), ML-GS (C) and ML-AGS (D) to solve the 
Marsan/Balbo/Conte problem. 


MFLOPs 



Figure 10: Computational work for ML-GS (C) and ML-AGS (D) to solve the Marsan-Balbo-Conte 
problem (Magnification of part of figure 9). 



Figure 11: Computational work for GS (A), AGS (B), ML-GS (C) and ML-AGS (D) to solve 
Molloy’s example SPN. 
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